Modeling phrasing and prominence using deep recurrent learning

Authors

Andrew Rosenberg

Raul Fernandez

Bhuvana Ramabhadran

Abstract

Models for predicting prosodic events, such as pitch accents and phrasal boundaries, often rely on machine learning classifiers that model context by combining a set of input features aggregated over a finite, and usually short, window of observations. Dynamic models go a step further by explicitly incorporating a model of the state sequence, but even then many practical implementations are limited to a low-order finite-state machine. This Markovian assumption, however, does not properly address the interaction between short- and long-term contextual factors that is known to affect the realization and placement of these prosodic events. Bidirectional Recurrent Neural Networks (BiRNNs) are a class of models that overcome this limitation by predicting each output as a function of a state variable that accumulates information over the entire input sequence; several such layers can be stacked to form a deep architecture able to extract more structure from the input features. These models have already demonstrated state-of-the-art performance on some prosodic regression tasks. In this work we examine a new application of BiRNNs to the task of classifying categorical prosodic events, and demonstrate that they outperform baseline systems.
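The abstract's central idea is that each prediction is a function of state accumulated over the whole sequence in both directions, with several such layers stacked. A minimal forward-pass sketch in pure Python can illustrate it. The sketch makes several assumptions: a plain tanh (Elman) recurrence, one feature vector per word, and a sigmoid output giving a per-word probability of a binary event such as a pitch accent. The weights are random and untrained. All class and function names are hypothetical, and this is not the authors' cell type, configuration, or training procedure.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rand_matrix(rng: random.Random, rows: int, cols: int, scale: float = 0.5) -> Matrix:
    """Uniform random weights in [-scale, scale]; untrained, for illustration only."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class RNNDirection:
    """One direction of a tanh (Elman) recurrence: h_t = tanh(W x_t + U h_{t-1} + b), h_0 = 0."""

    def __init__(self, rng: random.Random, in_dim: int, hid_dim: int) -> None:
        self.W = rand_matrix(rng, hid_dim, in_dim)
        self.U = rand_matrix(rng, hid_dim, hid_dim)
        self.b = [0.0] * hid_dim
        self.hid_dim = hid_dim

    def run(self, xs: List[Vector]) -> List[Vector]:
        h = [0.0] * self.hid_dim
        states = []
        for x in xs:
            pre = [a + c + d for a, c, d in zip(matvec(self.W, x), matvec(self.U, h), self.b)]
            h = [math.tanh(p) for p in pre]
            states.append(h)
        return states


class BiRNNLayer:
    """Forward and backward recurrences, concatenated: the state at step t
    summarizes inputs 0..t (forward) and t..T-1 (backward)."""

    def __init__(self, rng: random.Random, in_dim: int, hid_dim: int) -> None:
        self.fwd = RNNDirection(rng, in_dim, hid_dim)
        self.bwd = RNNDirection(rng, in_dim, hid_dim)

    def run(self, xs: List[Vector]) -> List[Vector]:
        f = self.fwd.run(xs)
        # Read the sequence right-to-left, then reverse so states line up with time.
        b = self.bwd.run(xs[::-1])[::-1]
        return [hf + hb for hf, hb in zip(f, b)]


class DeepBiRNNClassifier:
    """Stack of BiRNN layers followed by a per-step logistic output."""

    def __init__(self, in_dim: int, hid_dim: int, n_layers: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.layers = []
        dim = in_dim
        for _ in range(n_layers):
            self.layers.append(BiRNNLayer(rng, dim, hid_dim))
            dim = 2 * hid_dim  # the next layer sees concatenated fwd+bwd states
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
        self.b_out = 0.0

    def predict_proba(self, xs: List[Vector]) -> List[float]:
        """Return one event probability per input step (e.g. per word)."""
        h = xs
        for layer in self.layers:
            h = layer.run(h)
        probs = []
        for ht in h:
            z = sum(w * v for w, v in zip(self.w_out, ht)) + self.b_out
            probs.append(1.0 / (1.0 + math.exp(-z)))
        return probs


if __name__ == "__main__":
    # Hypothetical input: 6 words x 3 features (e.g. normalized pitch, energy, duration).
    data_rng = random.Random(1)
    words = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]
    model = DeepBiRNNClassifier(in_dim=3, hid_dim=4, n_layers=2, seed=0)
    print([round(p, 3) for p in model.predict_proba(words)])
```

The backward recurrence reads the sequence from its end, so the probability for the first word already depends on the last word. A fixed, short feature window or a low-order Markov model cannot capture that kind of dependency. In practice the weights would be learned, typically with backpropagation through time, and gated cells such as LSTMs are a common replacement for the plain tanh recurrence.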

Similar resources

Prosodic Prominence and Phrasing in Spoken Mandarin: The Case in the 3 tone

Prosodic prominence and phrasing follow some general principles common to the world's languages, but their execution is language-specific, especially in tone languages like Chinese. The kernel of the issue is how the pitch manifestations of individual tones accommodate to achieve prominence and phrasing within a certain prosodic domain. This paper tries to explore the peculiarity on pitch...

Prosodic Cues in Multimodal Speech Perception

Potential visual prosodic cues for prominence and phrasing comprising eyebrow movements were manipulated using a system for audio-visual text-to-speech synthesis which has been implemented based on the KTH rule-based synthesis. Two functions of prosody (prominence and phrasing) were tested in two separate experiments. A test sentence, ambiguous in terms of an internal phrase boundary, was used ...

Deep Recurrent Convolutional Neural Network: Improving Performance For Speech Recognition

A deep learning approach has been widely applied in sequence modeling problems. In automatic speech recognition (ASR), performance has been significantly improved by larger speech corpora and deeper neural networks. In particular, recurrent neural networks and deep convolutional neural networks have been applied in ASR successfully. Given the arising problem of training speed, w...

Capturing Dependency Syntax with "Deep" Sequential Models

Neural network (“deep learning”) models are taking machine learning approaches for language by storm. In particular, recurrent neural networks (RNNs), which are flexible non-Markovian models of sequential data, were shown to be effective for a variety of language processing tasks. Somewhat surprisingly, these seemingly purely sequential models are very capable at modeling syntactic phenome...

A New Method for Detecting Ships in Low Size and Low Contrast Marine Images: Using Deep Stacked Extreme Learning Machines

Detecting ships in marine images is an essential problem in maritime surveillance systems. Although several types of deep neural networks have been used almost ubiquitously for this purpose, the performance of such networks drops greatly when they are exposed to low-size and low-contrast images captured by passive monitoring systems. On the other hand factors such as sea waves, c...

Publication date: 2015
